A maximum likelihood equalization technique for robust speech recognition in adverse environments
Abstract
In this paper, we study the problem of robust speech recognition in adverse environments. We focus on two types of distortion: (1) additive noise and (2) channel mismatch. The maximum likelihood (ML) equalization technique is used to compensate for these distortions. Its performance is compared with that of the following channel equalization techniques: global mean subtraction (GMS), local mean subtraction (LMS), finite impulse response (FIR) highpass filtering, infinite impulse response (IIR) highpass filtering, RASTA (bandpass) filtering, and masking-based filtering. These techniques have recently been proposed in the literature and are computationally much simpler than ML equalization. It is shown that the ML equalization technique does not offer any significant advantage over the other channel equalization techniques in terms of recognition performance.
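The two mean subtraction baselines named in the abstract can be illustrated with a minimal sketch. The abstract does not give the feature domain, window length, or implementation, so everything here is an assumption: features are taken to be cepstral (log-spectral) vectors, where a fixed channel shows up as a constant additive offset, and the function names and the `half_window` parameter are hypothetical.

```python
from typing import List

# Assumption: frames[t][k] is cepstral coefficient k of frame t.
Frames = List[List[float]]


def global_mean_subtraction(frames: Frames) -> Frames:
    """Subtract the per-coefficient mean computed over the whole utterance."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(dims)]
    return [[f[k] - means[k] for k in range(dims)] for f in frames]


def local_mean_subtraction(frames: Frames, half_window: int) -> Frames:
    """Subtract a per-coefficient mean computed over a sliding window.

    The window covers up to `half_window` frames on each side of frame t
    and is truncated at the utterance boundaries (an illustrative choice).
    """
    n = len(frames)
    dims = len(frames[0])
    out: Frames = []
    for t in range(n):
        lo = max(0, t - half_window)
        hi = min(n, t + half_window + 1)
        span = frames[lo:hi]
        means = [sum(f[k] for f in span) / len(span) for k in range(dims)]
        out.append([frames[t][k] - means[k] for k in range(dims)])
    return out


# Toy data: a constant channel offset added to every frame.
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
offset = [0.5, -1.0]
distorted = [[c + o for c, o in zip(f, offset)] for f in clean]

gms_clean = global_mean_subtraction(clean)
gms_distorted = global_mean_subtraction(distorted)
lms_clean = local_mean_subtraction(clean, half_window=1)
lms_distorted = local_mean_subtraction(distorted, half_window=1)
```

Under these assumptions, both routines map the clean and the offset features to the same output, which is why they act as simple channel equalizers. Neither addresses additive noise, which does not reduce to a constant offset in this domain.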
Similar resources
Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition
Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...
Reduced complexity equalization of Lombard effect for speech recognition in noisy adverse environments
In real-world adverse environments, speech signal corruption by background noise, microphone channel variations, and speech production adjustments introduced by speakers in an effort to communicate efficiently over noise (Lombard effect) severely impact automatic speech recognition (ASR) performance. Recently, a set of unsupervised techniques reducing ASR sensitivity to these sources of distort...
Robust Speech Features and Acoustic Models for Speech Recognition
This thesis examines techniques to improve the robustness of automatic speech recognition (ASR) systems against noise distortions. The study is important as the performance of ASR systems degrades dramatically in adverse environments, and hence greatly limits the speech recognition application deployment in realistic environments. Towards this end, we examine a feature compensation approach and...
Attribute-based histogram equalization (HEQ) and its adaptation for robust speech recognition
Histogram equalization (HEQ) is a simple and effective feature normalization technique for robust speech recognition. Recently, we proposed to adapt HEQ transform to each test utterance using a maximum likelihood (ML) criterion and observed improved performance. In this paper, we further the study by applying attribute-based HEQ and its ML adaptation. Instead of applying a global HEQ transform ...
Classification of emotional speech using spectral pattern features
Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interaction. The aim of an SER system is to recognize human emotion by analyzing the acoustics of speech sounds. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...